Skip to content

feat: add trace stats computation#51

Open
duncanpharvey wants to merge 23 commits intomainfrom
duncan-harvey/trace-stats
Open

feat: add trace stats computation#51
duncanpharvey wants to merge 23 commits intomainfrom
duncan-harvey/trace-stats

Conversation

@duncanpharvey
Copy link
Copy Markdown
Collaborator

@duncanpharvey duncanpharvey commented Dec 3, 2025

What does this PR do?

  • Add agent side trace stats computation in the Serverless Compatibility Layer
  • Disables tracer computed stats

Motivation

Some tracers support trace stats computation, but all don't support it, or have it disabled by default. Trace stats computation in the Serverless Compatibility Layer allows all environments supported by this package to have trace stats.

https://datadoghq.atlassian.net/browse/SVLS-8032

Additional Notes

Describe how to test/QA your changes

@duncanpharvey duncanpharvey force-pushed the duncan-harvey/trace-stats branch 2 times, most recently from a7773cc to 070ae12 Compare December 11, 2025 14:51
@duncanpharvey duncanpharvey force-pushed the duncan-harvey/trace-stats branch from 2056c19 to d5a3f08 Compare December 17, 2025 22:16
@duncanpharvey duncanpharvey force-pushed the duncan-harvey/trace-stats branch 2 times, most recently from 006cb8e to 4142915 Compare March 27, 2026 14:16
@duncanpharvey duncanpharvey force-pushed the duncan-harvey/trace-stats branch from 5a64507 to 3562c93 Compare March 31, 2026 17:50
@duncanpharvey duncanpharvey marked this pull request as ready for review April 1, 2026 20:48
@duncanpharvey duncanpharvey requested a review from a team as a code owner April 1, 2026 20:48
Copilot AI review requested due to automatic review settings April 1, 2026 20:48
@duncanpharvey duncanpharvey requested a review from a team as a code owner April 1, 2026 20:48
@duncanpharvey duncanpharvey requested review from Lewis-E and litianningdatadog and removed request for a team April 1, 2026 20:48
@duncanpharvey
Copy link
Copy Markdown
Collaborator Author

@codex review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds trace stats computation capabilities to the Serverless Compatibility Layer. It enables the system to compute and send trace statistics when tracers haven't already computed them, enhancing stats availability across all supported serverless environments.

Changes:

  • Adds a new StatsConcentratorService module that manages span concentration and stats bucket generation using the libdd-trace-stats library
  • Adds a new StatsGenerator module that extracts metadata and spans from traces and sends them to the stats concentrator
  • Extends ServerlessTraceProcessor to optionally call the stats generator when the tracer hasn't already computed stats
  • Extends ServerlessStatsFlusher to flush both tracer-computed and serverless-computed stats
  • Adds integration tests to verify stats computation and to verify stats are skipped when client has already computed them
  • Adds configuration options for service, env, and version fields, and an environment variable to enable/disable stats computation

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/stats_concentrator_service.rs New module implementing stats concentration service with metadata tracking and flush logic
src/stats_generator.rs New module for extracting trace information and sending to stats concentrator
src/trace_processor.rs Modified to conditionally invoke stats generator based on client-computed-stats header
src/stats_flusher.rs Modified to flush both channel and concentrator-based stats with proper flow control
src/config.rs Added service, env, version fields and corresponding environment variable reading
src/main.rs Added initialization of stats computation with environment variable toggle
src/lib.rs Exported new modules
src/mini_agent.rs Minor code style improvements (let-else pattern)
Cargo.toml Added thiserror and libdd-trace-stats dependencies
tests/integration_test.rs Added comprehensive integration tests for stats generation scenarios
tests/common/mock_server.rs Updated to handle stats intake with proper 202 status code response
tests/common/helpers.rs Extended send_tcp_request to support additional headers

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ac8d049051

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/datadog-trace-agent/src/stats_concentrator_service.rs
Comment thread crates/datadog-trace-agent/src/stats_concentrator_service.rs Outdated
Comment thread crates/datadog-trace-agent/src/stats_concentrator_service.rs Outdated
Comment thread crates/datadog-trace-agent/src/stats_concentrator_service.rs Outdated
Comment thread crates/datadog-trace-agent/src/stats_concentrator_service.rs
Comment thread crates/datadog-trace-agent/src/stats_concentrator_service.rs Outdated
Comment thread crates/datadog-serverless-compat/src/main.rs Outdated
self.flush_stats(config.clone(), buffer.to_vec()).await;
buffer.clear();
// copy the batch for this flush
let channel_stats = buffer.to_vec();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: LLM recommends let channel_stats = std::mem::take(&mut *buffer); to replace let channel_stats = buffer.to_vec(); buffer.clear(); but that's unrelated to the actual intent of the change here, so.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied this change!

}
Ok(None) => {}
Err(e) => {
error!("Failed to flush stats concentrator: {e}");
Copy link
Copy Markdown
Contributor

@Lewis-E Lewis-E Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want a log in a loop here or a error to be caught and have the whole process restarted/shut down? The trace flusher and stats flusher both have a a Err("<this code> task terminated unexpectedly".into());, would be consistent to have that for the concentrator as well.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a task terminated unexpectedly check for the span concentrator service to be consistent with these other background tasks.

@duncanpharvey duncanpharvey force-pushed the duncan-harvey/trace-stats branch from 9e4544f to c2191f8 Compare April 7, 2026 15:50
@duncanpharvey duncanpharvey marked this pull request as draft April 9, 2026 20:51
@duncanpharvey duncanpharvey force-pushed the duncan-harvey/trace-stats branch 3 times, most recently from df2f714 to 3b74133 Compare April 20, 2026 17:14
@duncanpharvey duncanpharvey marked this pull request as ready for review April 20, 2026 17:21
@duncanpharvey duncanpharvey force-pushed the duncan-harvey/trace-stats branch from 3b74133 to e46f752 Compare April 24, 2026 18:23
pub fn new(config: Arc<Config>) -> (Self, StatsConcentratorHandle) {
let (tx, rx) = mpsc::unbounded_channel();
let handle = StatsConcentratorHandle::new(tx);
// TODO: set span_kinds_stats_computed and peer_tag_keys
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when will we do this todo?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to work on it this week!

// Drain client stats in buffer and stats from concentrator on interval
_ = interval.tick() => {
let client_stats = std::mem::take(&mut buffer);
if !client_stats.is_empty() || self.stats_concentrator.is_some() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we check for the stats concentrator to be some here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stats flusher is either flushing trace stats received from the tracer in client_stats or trace stats computed by the agent in stats_concentrator. We want to flush if either one of these cases is true. I added more comments to clarify this.

tracer_language: tracer_payload.language_name.clone(),
tracer_version: tracer_payload.tracer_version.clone(),
hostname: String::new(),
service_name: None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the payload doesn't have a service name to pull?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, the TracerPayload does not have a service name to pull.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the TracerPayload struct does not have a service name to pull, we can pull the service name from _dd.base_service on the root span.

See #51 (comment)

}
}
} else {
error!("Unsupported tracer payload version. Failed to send trace stats.");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably say which version we saw that we don't know how to handle. i saw a proposal for a new tracer payload version coming down the pike

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the unsupported tracer payload version to the error log.

gh-worker-dd-mergequeue-cf854d Bot pushed a commit to DataDog/libdatadog that referenced this pull request Apr 27, 2026
# What does this PR do?

Derives PartialEq and Eq on TracerMetadata, enabling equality comparisons between instances using `==` and `!=`.

# Motivation

Adding these standard library traits enables equality assertions without requiring callers to implement their own field by field comparison. Needed for DataDog/serverless-components#51 (comment)

# Additional Notes

# How to test the change?

Unit tests


Co-authored-by: duncan.harvey <duncan.harvey@datadoghq.com>
Comment thread crates/datadog-trace-agent/src/config.rs Outdated
Comment thread crates/datadog-trace-agent/src/config.rs Outdated
Comment thread crates/datadog-trace-agent/src/trace_processor.rs Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants